KIISE Transactions on Computing Practices

Korean Title: 생성적 적대 신경망과 데이터 확장을 이용한 딥러닝 기반 TTS 음질 개선
English Title: Fidelity Enhancement for Deep Learning-based TTS using a Generative Adversarial Network and Data Augmentation
Authors: Jin Choi (최진), Jinhyeok Yang (양진혁), Injung Kim (김인중)
Citation: Vol. 26, No. 5, pp. 256-260 (May 2020)
Korean Abstract (English translation)
This paper introduces TE-GAN (TTS Enhancement GAN), a deep learning model that uses a generative adversarial network to refine the Mel-spectrogram synthesized by a deep learning-based TTS model so that it resembles the Mel-spectrogram of real speech. TE-GAN is designed with the characteristics of speech signals in mind and delivers a strong improvement in audio quality when combined with a simple vocoder such as the Griffin-Lim algorithm. In addition, we propose a data augmentation method based on temporal multi-agents (TMA) for training TE-GAN effectively. Experiments show that the proposed methods can greatly improve the quality of speech synthesized by a TTS system. In the experiments, TE-GAN made the Mel-spectrum synthesized by Tacotron more similar to that of real speech, and the MOS of the synthesized speech rose substantially, from 2.07 to 3.24.
English Abstract
In this paper, we introduce TE-GAN (TTS Enhancement GAN), a deep learning model that uses a generative adversarial network to enhance the Mel-spectrogram synthesized by a deep learning-based TTS model so that it resembles that of human speech. TE-GAN was designed with the characteristics of speech signals in mind, and it significantly improves the fidelity of speech signals even when combined with a simple vocoder such as the Griffin-Lim algorithm. Additionally, we present a data augmentation technique using a Temporal Multi-Agent (TMA) approach for effective learning. Experimental results demonstrate that the proposed methods significantly improve the fidelity of speech synthesized by the TTS system. In experiments, TE-GAN made the Mel-spectrogram produced by Tacotron more similar to the Mel-spectrogram of human speech. In addition, the MOS of the synthesized speech improved significantly, from 2.07 to 3.24.
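
To put the reported MOS figures in context, the following is a minimal sketch of how a mean opinion score is typically computed (the arithmetic mean of listener ratings on a 1 to 5 scale) and of the gain implied by the two values quoted in the abstract. The function names and the example ratings are hypothetical and are not taken from the paper; only the values 2.07 and 3.24 come from the abstract.

```python
from statistics import mean


def mean_opinion_score(ratings):
    """Average listener ratings on the usual 1 (bad) to 5 (excellent) scale."""
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie in [1, 5]")
    return mean(ratings)


def mos_gain(before, after):
    """Return (absolute gain, relative gain) between two MOS values."""
    return after - before, (after - before) / before


# Hypothetical ratings, for illustration only (not data from the paper).
example_ratings = [3, 4, 3, 2, 4]
example_mos = mean_opinion_score(example_ratings)

# MOS values reported in the abstract: Tacotron output before and after TE-GAN.
abs_gain, rel_gain = mos_gain(2.07, 3.24)
print(f"absolute gain: {abs_gain:.2f}, relative gain: {rel_gain:.1%}")
```

Under these assumptions, the reported change corresponds to an absolute gain of about 1.17 MOS points, or roughly 56.5% relative to the baseline.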
Keywords: deep learning, speech synthesis, generative adversarial network, data augmentation, TTS fidelity enhancement